library(dplyr)
library(INLA)
library(ggplot2)
library(patchwork)
library(inlabru)
library(mapview)
# load some libraries to generate nice map plots
library(scico)Practical 5
Models for areal data
In this practical we are going to fit an areal model. We will:
Learn how to fit an areal model in
inlabruLearn how to add spatial covariates to the model
Learn how to do predictions
Learn how to simulate from the fitted model
Libraries to load:
The data
We consider data on respiratory hospitalizations for Greater Glasgow and Clyde in 2007. The data are available from the CARBayesdata R Package:
library(CARBayesdata)
data(pollutionhealthdata)
data(GGHB.IZ)The pollutionhealthdata contains the spatiotemporal data on respiratory hospitalizations, air pollution concentrations and socio-economic deprivation covariates for the 271 Intermediate Zones (IZ) that make up the Greater Glasgow and Clyde health board in Scotland. Data are provided by the Scottish Government and the available variables are:
IZ: unique identifier for each IZ.year: the year when the measurements were takenobserved: observed numbers of hospitalizations due to respiratory disease.expected: expected numbers of hospitalizations due to respiratory disease computed using indirect standardisation from Scotland-wide respiratory hospitalization rates.pm10: Average particulate matter (less than 10 microns) concentrations.jsa: The percentage of working age people who are in receipt of Job Seekers Allowanceprice: Average property price (divided by 100,000).
The GGHB.IZ data is a Simple Features (sf) object containing the spatial polygon information for the set of 271 Intermediate Zones (IZ), that make up of the Greater Glasgow and Clyde health board in Scotland ( Figure 1 ).
We first merge the two dataset and select only one year of data, compute the SME and plot the observed
resp_cases <- merge(GGHB.IZ %>%
mutate(space = 1:dim(GGHB.IZ)[1]),
pollutionhealthdata, by = "IZ") %>%
filter(year == 2007) %>%
mutate(SMR = observed/expected)
ggplot() + geom_sf(data = resp_cases, aes(fill = SMR)) + scale_fill_scico(direction = -1)Then we compute the adjacency matrix using the functions poly2nb() and nb2mat() in the spdep library. We then convert the adjacency matrix into the precision matrix \(\mathbf{Q}\) of the CAR model. Remember this matrix has, on the diagonal the number of e
library(spdep)
W.nb <- poly2nb(GGHB.IZ,queen = TRUE)
R <- nb2mat(W.nb, style = "B", zero.policy = TRUE)
diag = apply(R,1,sum)
Q = -R
diag(Q) = diagThe model
We fit a first model to the data where we consider a Poisson model for the observed cases.
Stage 1 Model for the response \[ y_i|\eta_i\sim\text{Poisson}(E_i\lambda_i) \] where \(E_i\) are the expected cases for area \(i\).
Stage 2 Latent field model \[ \eta_i = \text{log}(\lambda_i) = \beta_0 + \omega_i + z_i \] where
- \(\beta_0\) is a common intercept
- \(\mathbf{\omega} = (\omega_1, \dots, \omega_k)\) is a conditional Autorgressive model (CAR) with precision matrix \(\tau_1\mathbf{Q}\)
- \(\mathbf{z} = (z_1, \dots, z_k)\) is an unstrictured random effect with precision \(\tau_2\)
Stage 3 Hyperparameters
The hyperparameters of the model are \(\tau_1\) and \(\tau_2\)
NOTE In this case the linear predictor \(\eta\) consists of three components!!
After fitting the model we want to extract results.
We now look at the predictions over space.
Finally we want to compare our observations \(y_i\) with the predicted means of the Poisson distribution \(E_i\exp(\lambda_i)\)
pred$cases %>% ggplot() + geom_point(aes(observed, mean)) +
geom_errorbar(aes(observed, ymin = q0.025, ymax = q0.975)) +
geom_abline(intercept = 0, slope = 1)Note: Here we are predicting the mean of counts, not the counts!!! Predicting counts is the theme of the next task!
Getting prediction densities
Posterior predictive distributions, that is \(\pi(y_i^{\text{new}}|\mathbf{y})\) are of interest in many applied problems. The bru() function does not return predictive densities. In the previous step we have computed predictions for the expected counts \(\pi(E_i\lambda_i|\mathbf{y})\). The predictive distribution is then: \[
\pi(y_i^{\text{new}}|\mathbf{y}) = \int \pi(y_i|E_i\lambda_i)\pi(E_i\lambda_i|\mathbf{y})\ dE_i\lambda_i
\] where, in our case, \(\pi(y_i|E_i\lambda_i)\) is Poisson with mean \(E_i\lambda_i\). We can achieve this using the following algorith:
- Simulate \(n\) replicates of \(g^k = E_i\lambda_i\) for \(k = 1,\dots,n\) using the function generate() which takes the same input as predict()
- For each of the \(k\) replicates simulate a new value \(y_i^{new}\) using the function rpois()
- Summarise the \(n\) samples of \(y_i^{new}\) using, for example the mean and the 0.025 and 0.975 quantiles.
Here is the code:
# simulate 1000 realizations of E_i\lambda_i
expected_counts = generate(fit, resp_cases,
~ expected * exp(Intercept + space),
n.samples = 1000)
# simulate poisson data
aa = rpois(271*1000, lambda = as.vector(expected_counts))
sim_counts = matrix(aa, 271, 1000)
# summarise the samples with posterior means and quantiles
pred_counts = data.frame(observed = resp_cases$observed,
m = apply(sim_counts,1,mean),
q1 = apply(sim_counts,1,quantile, 0.025),
q2 = apply(sim_counts,1,quantile, 0.975),
vv = apply(sim_counts,1,var)
)